College Basketball Analysis 2015 - 2019

Mehrabi Hasan

Description: This analysis explores college basketball data from 2015 to 2019 to derive insights and predictions that could be useful to organizations or enthusiasts.

In [1]:
# Package imports
import numpy as np 
import pandas as pd 
In [2]:
## Read the CSV file into a DataFrame
cbb = pd.read_csv('cbb.csv')
cbb.head(5)
Out[2]:
TEAM CONF G W ADJOE ADJDE BARTHAG EFG_O EFG_D TOR ... FTRD 2P_O 2P_D 3P_O 3P_D ADJ_T WAB POSTSEASON SEED YEAR
0 North Carolina ACC 40 33 123.3 94.9 0.9531 52.6 48.1 15.4 ... 30.4 53.9 44.6 32.7 36.2 71.7 8.6 2ND 1.0 2016
1 Wisconsin B10 40 36 129.1 93.6 0.9758 54.8 47.7 12.4 ... 22.4 54.8 44.7 36.5 37.5 59.3 11.3 2ND 1.0 2015
2 Michigan B10 40 33 114.4 90.4 0.9375 53.9 47.7 14.0 ... 30.0 54.7 46.8 35.2 33.2 65.9 6.9 2ND 3.0 2018
3 Texas Tech B12 38 31 115.2 85.2 0.9696 53.5 43.0 17.7 ... 36.6 52.8 41.9 36.5 29.7 67.5 7.0 2ND 3.0 2019
4 Gonzaga WCC 39 37 117.8 86.3 0.9728 56.6 41.1 16.2 ... 26.9 56.3 40.0 38.2 29.0 71.5 7.7 2ND 1.0 2017

5 rows × 24 columns

In [3]:
#### Let's take a closer look at the data
print(f'The shape of the dataframe is {cbb.shape}')
The shape of the dataframe is (1757, 24)
In [4]:
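# Average every numeric column per conference; numeric_only=True skips text
# columns such as TEAM and POSTSEASON, which recent pandas versions refuse to average
cbb.groupby('CONF').mean(numeric_only=True)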
Out[4]:
G W ADJOE ADJDE BARTHAG EFG_O EFG_D TOR TORD ORB ... FTR FTRD 2P_O 2P_D 3P_O 3P_D ADJ_T WAB SEED YEAR
CONF
A10 32.642857 17.385714 105.055714 101.025714 0.594563 50.042857 49.657143 18.022857 18.638571 28.500000 ... 34.832857 34.578571 49.730000 48.798571 33.720000 34.114286 68.122857 -6.578571 9.428571 2017.000000
ACC 34.306667 21.093333 112.461333 96.256000 0.820261 51.266667 48.521333 17.376000 18.249333 31.594667 ... 34.802667 31.110667 50.484000 47.445333 35.080000 33.574667 67.813333 0.937333 4.789474 2017.000000
AE 31.044444 15.022222 98.824444 106.511111 0.329853 49.702222 51.171111 18.984444 18.533333 27.120000 ... 33.664444 34.155556 48.540000 50.326667 34.353333 35.115556 67.880000 -11.546667 13.800000 2017.000000
ASun 29.925000 14.175000 100.660000 108.597500 0.316492 50.440000 52.105000 18.455000 17.570000 28.315000 ... 32.687500 34.145000 49.482500 51.665000 34.837500 35.212500 69.862500 -11.585000 15.250000 2017.000000
Amer 32.912281 18.526316 106.005263 99.294737 0.644409 49.424561 47.896491 18.482456 18.550877 31.866667 ... 34.905263 32.878947 48.531579 46.514035 33.982456 33.457895 67.498246 -4.473684 7.133333 2017.052632
B10 34.014286 20.400000 110.994286 96.442857 0.800939 51.025714 48.352857 17.434286 18.025714 30.638571 ... 33.670000 31.201429 49.911429 46.784286 35.281429 34.210000 67.482857 -0.092857 6.030303 2017.000000
B12 34.400000 21.340000 112.966000 95.028000 0.860170 51.178000 48.502000 18.240000 19.542000 32.434000 ... 35.920000 34.298000 50.102000 47.012000 35.464000 34.162000 68.260000 2.236000 5.363636 2017.000000
BE 33.840000 20.420000 111.898000 97.270000 0.803320 52.018000 49.736000 17.864000 18.826000 29.758000 ... 35.818000 33.886000 51.276000 48.856000 35.536000 34.138000 69.000000 0.212000 6.500000 2017.000000
BSky 30.033333 14.433333 101.470000 108.355000 0.338455 51.083333 52.585000 18.038333 17.700000 26.950000 ... 35.791667 37.326667 49.451667 51.480000 35.900000 36.428333 68.996667 -11.686667 14.400000 2017.000000
BSth 29.814815 15.222222 100.416667 107.620370 0.340306 50.522222 51.577778 18.912963 18.396296 27.579630 ... 33.453704 35.140741 49.318519 50.731481 34.977778 35.253704 67.714815 -10.955556 14.666667 2017.018519
BW 30.355556 14.888889 100.935556 104.082222 0.424398 48.888889 49.737778 18.722222 18.068889 28.528889 ... 34.904444 36.466667 47.575556 48.277778 34.244444 35.140000 68.071111 -9.993333 14.000000 2017.000000
CAA 31.240000 16.280000 104.278000 105.866000 0.459290 51.030000 51.276000 17.562000 16.886000 28.736000 ... 35.484000 34.726000 50.134000 50.374000 35.056000 35.224000 67.808000 -9.416000 13.000000 2017.000000
CUSA 31.200000 16.228571 101.960000 104.081429 0.448834 49.817143 50.250000 18.382857 18.354286 28.245714 ... 34.482857 34.314286 49.034286 49.190000 34.102857 34.692857 68.661429 -9.595714 13.600000 2017.000000
Horz 31.265306 15.489796 102.373469 105.159184 0.432371 50.151020 51.151020 18.469388 18.326531 28.828571 ... 34.055102 35.826531 48.928571 50.251020 34.830612 35.338776 69.514286 -10.183673 14.000000 2017.040816
Ind 29.000000 17.000000 103.300000 104.500000 0.464400 52.300000 46.400000 20.500000 19.000000 28.800000 ... 39.000000 44.100000 49.800000 44.600000 37.100000 33.100000 67.100000 -6.200000 NaN 2015.000000
Ivy 28.225000 14.450000 102.392500 103.647500 0.467645 50.992500 50.302500 19.082500 18.137500 27.047500 ... 33.402500 33.940000 50.117500 49.492500 34.957500 34.505000 68.125000 -8.422500 13.400000 2017.000000
MAAC 32.418182 15.018182 100.647273 105.816364 0.370964 49.716364 50.427273 19.163636 18.954545 29.294545 ... 35.629091 36.825455 48.394545 49.018182 34.641818 35.209091 68.687273 -11.841818 14.800000 2017.000000
MAC 31.483333 17.650000 104.356667 103.518333 0.519375 50.396667 50.246667 18.008333 18.126667 29.791667 ... 35.436667 34.895000 49.523333 49.410000 34.513333 34.445000 68.790000 -7.908333 11.800000 2017.000000
MEAC 30.500000 11.953125 94.231250 110.796875 0.158767 46.782812 51.050000 19.996875 19.203125 30.054687 ... 35.365625 38.525000 46.225000 50.681250 32.020313 34.426563 69.115625 -14.832812 16.000000 2016.968750
MVC 31.840000 17.300000 103.056000 100.454000 0.552196 49.888000 49.400000 18.922000 19.068000 26.700000 ... 35.304000 34.824000 48.400000 48.004000 35.082000 34.634000 66.930000 -7.394000 10.000000 2017.000000
MWC 31.618182 17.181818 104.558182 101.721818 0.571245 50.274545 49.698182 18.121818 17.934545 27.903636 ... 37.229091 34.667273 49.360000 48.645455 34.463636 34.398182 68.545455 -6.943636 10.000000 2017.000000
NEC 30.940000 13.800000 97.586000 108.532000 0.244788 48.412000 50.956000 19.488000 19.232000 30.170000 ... 34.022000 36.396000 47.520000 50.276000 33.360000 34.792000 68.630000 -13.310000 16.000000 2017.000000
OVC 29.966667 14.900000 101.610000 106.490000 0.380663 50.336667 51.751667 19.190000 19.165000 29.588333 ... 34.953333 36.970000 49.548333 50.796667 34.605000 35.591667 68.660000 -10.333333 13.500000 2017.000000
P12 33.316667 19.333333 109.091667 99.128333 0.720798 51.420000 49.418333 17.983333 17.770000 30.253333 ... 36.096667 33.306667 50.448333 48.006667 35.528333 34.751667 68.836667 -2.665000 6.666667 2017.000000
Pat 31.120000 14.780000 100.404000 106.542000 0.345906 51.212000 51.742000 19.096000 18.660000 26.660000 ... 32.880000 34.426000 50.212000 50.792000 35.210000 35.560000 67.752000 -11.532000 14.800000 2017.000000
SB 30.931034 16.293103 102.282759 104.020690 0.453569 49.589655 49.941379 18.805172 18.896552 29.851724 ... 35.946552 36.748276 48.917241 49.244828 33.753448 34.136207 68.706897 -9.548276 14.000000 2017.051724
SC 30.640000 15.940000 102.832000 106.238000 0.421926 51.172000 52.050000 19.116000 18.994000 29.492000 ... 33.670000 36.498000 50.024000 51.392000 35.340000 35.380000 68.722000 -9.496000 11.400000 2017.000000
SEC 33.714286 19.642857 110.310000 96.712857 0.790094 50.195714 48.250000 18.350000 18.900000 31.970000 ... 37.907143 36.254286 49.378571 46.891429 34.394286 33.724286 68.628571 -0.764286 6.000000 2017.000000
SWAC 31.040000 11.860000 93.278000 110.490000 0.152550 46.142000 50.872000 20.346000 19.768000 30.708000 ... 36.422000 41.470000 45.572000 50.726000 31.612000 34.128000 68.498000 -14.910000 15.800000 2017.000000
Slnd 28.307692 13.953846 98.675385 108.643077 0.274008 49.449231 51.964615 20.326154 19.975385 30.113846 ... 37.576923 39.992308 48.444615 51.090769 34.186154 35.724615 69.392308 -11.376923 14.200000 2017.000000
Sum 29.744186 15.511628 103.495349 106.390698 0.426023 51.597674 51.948837 17.648837 17.093023 26.172093 ... 33.441860 34.027907 49.676744 50.165116 36.627907 36.783721 68.918605 -9.327907 14.200000 2016.930233
WAC 30.292683 15.439024 99.575610 104.746341 0.393039 48.507317 49.812195 19.480488 18.917073 29.648780 ... 35.846341 38.521951 47.570732 48.858537 33.485366 34.297561 68.587805 -9.919512 13.600000 2017.048780
WCC 31.940000 17.560000 106.168000 101.818000 0.576142 51.074000 50.164000 17.716000 17.566000 28.452000 ... 34.858000 35.064000 49.760000 49.098000 35.606000 34.708000 67.506000 -6.136000 6.000000 2017.000000

33 rows × 21 columns

In [5]:
#### Let's use pandas-profiling to get an initial look at which columns correlate with seed placement
import pandas_profiling  
cbb.profile_report(style={'full_width':True})
Out[5]:

The pandas-profiling report surfaces a few interesting relationships:

  1. Seed is heavily correlated with ADJDE (Adjusted Defensive Efficiency).
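The correlation reported by pandas-profiling can also be checked directly with pandas. The following is a minimal sketch, not part of the original notebook: the helper name `correlations_with` is hypothetical, and the sample frame holds only the five rows shown in the `cbb.head(5)` output above, so its numbers illustrate the mechanics rather than the real relationship.

```python
import pandas as pd

def correlations_with(df: pd.DataFrame, target: str = "SEED") -> pd.Series:
    """Return the Pearson correlation of each numeric column with `target`,
    ordered from strongest to weakest absolute correlation."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order]

# Five rows copied from the cbb.head(5) output above, for illustration only;
# five rows are far too few to say anything about the real relationship.
sample = pd.DataFrame({
    "TEAM": ["North Carolina", "Wisconsin", "Michigan", "Texas Tech", "Gonzaga"],
    "ADJOE": [123.3, 129.1, 114.4, 115.2, 117.8],
    "ADJDE": [94.9, 93.6, 90.4, 85.2, 86.3],
    "BARTHAG": [0.9531, 0.9758, 0.9375, 0.9696, 0.9728],
    "SEED": [1.0, 1.0, 3.0, 3.0, 1.0],
})

seed_corr = correlations_with(sample)
print(seed_corr)
```

On the full dataset the call would be `correlations_with(cbb)`. Teams without a seed have `NaN` in that column, and `DataFrame.corr` excludes missing values pairwise, so only seeded teams contribute to the result.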
In [6]:
### Let's explore the dataset to find the top teams by ADJDE, ADJOE, and overall wins

cbb.columns = cbb.columns.str.lower() # lowercase all the column names
cbb.groupby('team')['w'].sum().sort_values(ascending = False).head(10)
Out[6]:
team
Gonzaga           163
Villanova         162
Kentucky          153
Duke              149
Virginia          148
Kansas            147
North Carolina    147
Michigan St.      138
Oregon            136
Arizona           135
Name: w, dtype: int64
In [7]:
cbb.groupby('conf')['w'].sum().sort_values(ascending = False).head(10)
Out[7]:
conf
ACC     1582
B10     1428
SEC     1375
A10     1217
P12     1160
CUSA    1136
B12     1067
MAC     1059
Amer    1056
BE      1021
Name: w, dtype: int64
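Summed wins also grow with the number of teams and seasons a conference contributes, so a win rate can be a fairer comparison. The sketch below is an assumption about how one might compute it, not something the original notebook does: `conference_win_rate` is a hypothetical helper, and the sample frame reuses the five rows from the `cbb.head(5)` output with lowercased column names.

```python
import pandas as pd

def conference_win_rate(df: pd.DataFrame) -> pd.Series:
    """Total wins divided by total games for each conference, best first."""
    totals = df.groupby("conf")[["w", "g"]].sum()
    return (totals["w"] / totals["g"]).sort_values(ascending=False)

# Five rows copied from the cbb.head(5) output, for illustration only
sample = pd.DataFrame({
    "team": ["North Carolina", "Wisconsin", "Michigan", "Texas Tech", "Gonzaga"],
    "conf": ["ACC", "B10", "B10", "B12", "WCC"],
    "g": [40, 40, 40, 38, 39],
    "w": [33, 36, 33, 31, 37],
})

rates = conference_win_rate(sample)
print(rates.round(3))
```

Dividing the summed wins by the summed games weights every game equally, which differs slightly from averaging each team's individual win percentage.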
In [36]:
cbb[['team','conf','adjoe']].sort_values(by = ['adjoe'],ascending = False).head(10)
Out[36]:
team conf adjoe
1 Wisconsin B10 129.1
9 Villanova BE 128.4
1585 Oklahoma St. B12 126.8
11 Notre Dame ACC 125.3
5 Duke ACC 125.2
29 Gonzaga WCC 123.4
0 North Carolina ACC 123.3
1730 Michigan B10 123.3
1733 Purdue B10 123.2
1528 Kentucky SEC 123.2
In [37]:
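# ADJDE measures points allowed per 100 possessions, so lower is better:
# sorting in descending order lists the ten weakest defenses
cbb[['team','conf','adjde']].sort_values(by = ['adjde'],ascending = False).head(10)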
Out[37]:
team conf adjde
798 North Carolina A&T MEAC 124.0
1352 Alabama A&M SWAC 123.6
242 USC Upstate ASun 123.0
1154 The Citadel SC 120.0
940 Bryant NEC 119.8
1180 Samford SC 119.4
806 Howard MEAC 119.4
803 Delaware St. MEAC 119.2
1286 Incarnate Word Slnd 119.2
812 South Carolina St. MEAC 119.2
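To list the strongest defenses instead, the smallest ADJDE values are the ones of interest. A minimal sketch using `DataFrame.nsmallest`, again on the five rows from the `cbb.head(5)` output rather than the full dataset:

```python
import pandas as pd

# Five rows copied from the cbb.head(5) output, for illustration only
sample = pd.DataFrame({
    "team": ["North Carolina", "Wisconsin", "Michigan", "Texas Tech", "Gonzaga"],
    "conf": ["ACC", "B10", "B10", "B12", "WCC"],
    "adjde": [94.9, 93.6, 90.4, 85.2, 86.3],
})

# nsmallest returns the rows with the lowest values, already sorted ascending
best_defenses = sample.nsmallest(3, "adjde")
print(best_defenses)
```

On the full dataset, `cbb.nsmallest(10, 'adjde')[['team', 'conf', 'adjde']]` would give the ten best defensive team-seasons.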
In [10]:
cbb.columns
Out[10]:
Index(['team', 'conf', 'g', 'w', 'adjoe', 'adjde', 'barthag', 'efg_o', 'efg_d',
       'tor', 'tord', 'orb', 'drb', 'ftr', 'ftrd', '2p_o', '2p_d', '3p_o',
       '3p_d', 'adj_t', 'wab', 'postseason', 'seed', 'year'],
      dtype='object')
In [13]:
#### Interestingly, the ACC and Big Ten consistently come out ahead of the other conferences in total wins
#### Let's see if we can visualize this relationship
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
%matplotlib inline

fig = px.scatter(cbb,x = 'team',y = 'w',color = 'adjde')
fig.show()
In [21]:
px.scatter_3d(cbb, x = 'adjde',y = 'adjoe',z = 'barthag',color = 'w',title = '3D plot of adjusted defensive efficiency, adjusted offensive efficiency, and power rating (BARTHAG), colored by wins')
In [44]:
# Interesting: in the plot above, offensive efficiency seems to matter more for wins than defensive efficiency
# Average profile of the tournament champions
champions = cbb[cbb['postseason'] == 'Champions']
champions.mean(numeric_only=True)
Out[44]:
g            39.20000
w            34.80000
adjoe       124.14000
adjde        91.40000
barthag       0.97086
efg_o        55.82000
efg_d        46.90000
tor          15.70000
tord         18.70000
orb          33.06000
drb          27.42000
ftr          33.34000
ftrd         27.70000
2p_o         55.16000
2p_d         46.28000
3p_o         38.00000
3p_d         31.96000
adj_t        67.24000
wab           9.94000
seed          1.20000
year       2017.00000
dtype: float64
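One way to put offense and defense on the same footing is the efficiency margin, ADJOE minus ADJDE. For the champion averages above that is 124.14 - 91.40 = 32.74 points per 100 possessions. The sketch below is an illustration rather than part of the original analysis: the `margin` column and the `add_efficiency_margin` helper are hypothetical names, and the sample frame reuses the five runner-up rows from the `cbb.head(5)` output.

```python
import pandas as pd

def add_efficiency_margin(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'margin' column (adjoe minus adjde)."""
    out = df.copy()
    out["margin"] = out["adjoe"] - out["adjde"]
    return out

# Five rows copied from the cbb.head(5) output, for illustration only
sample = pd.DataFrame({
    "team": ["North Carolina", "Wisconsin", "Michigan", "Texas Tech", "Gonzaga"],
    "adjoe": [123.3, 129.1, 114.4, 115.2, 117.8],
    "adjde": [94.9, 93.6, 90.4, 85.2, 86.3],
})

ranked = add_efficiency_margin(sample).sort_values("margin", ascending=False)
print(ranked.round(1))
```

Applied to the full dataset, `add_efficiency_margin(cbb)` would allow the champions' margins to be compared against the rest of the field.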